The prediction of future from the past: an old problem from a modern perspective 
The idea of predicting the future from the knowledge of the past is quite natural when dealing
with systems whose equations of motion are not known. This long-standing issue is revisited in
the light of the modern ergodic theory of dynamical systems and becomes particularly interesting from a
pedagogical perspective due to its close link with Poincaré recurrence. Using this connection, a
very general result of ergodic theory, Kac's lemma, can be used to establish the intrinsic limitations
to the possibility of predicting the future from the past. Contrary to naive expectation, predictability
turns out to be hindered by the effective number of degrees of freedom of a system rather than by the
presence of chaos. If the effective number of degrees of freedom becomes large enough, predictions
turn out to be practically impossible, regardless of the regular or chaotic nature of the system. The
discussion of these issues is illustrated with the help of the numerical study of simple models.



I. INTRODUCTION 

Predicting the future state of a system has always been
a natural motivation for the development of science, with
applications such as weather forecasting and tidal prediction.
Understanding the limitations to the predictability of a
system's evolution is therefore crucial.

In deterministic systems, where the future is uniquely
determined by the present, two main approaches to the
predictability problem can be distinguished. The first refers
to systems whose evolution laws are known, in terms of either
differential or difference equations. In this case,
predictability is mainly limited by the presence of
sensitivity to initial conditions (deterministic chaos), which,
as taught in dynamical systems courses, is characterized
by the Lyapunov exponent. The second approach refers
to phenomena whose governing laws are not known, but
whose evolution can be measured and recorded. In such
a case, the best practical strategy is to use the past, as
a full-scale model of the system, to make predictions on
the future evolution.

The present paper discusses at an introductory level
the latter method, which was developed in the framework
of nonlinear time series analysis. This topic is
seldom included in basic courses and is closely related
to an apparently distant classical theme: the Poincaré
recurrences. Surprisingly, although simple to establish,
this connection has been overlooked even by specialists,
as recently remarked by Altmann and Kantz. The link
also allows us to clarify the practical role of theoretical
concepts such as the attractor dimension of a dynamical
system. Indeed, as we shall see, when the evolution laws
are unknown, the actual constraints on our prediction
capabilities are set by the number of degrees of freedom
(attractor dimension) rather than by the presence of chaos.
This fact is often overlooked in favor of the widespread
folklore of the butterfly effect. In this respect, it is
important to stress that such limitations to predictability
are a consequence of rather general results of ergodic
and dynamical-systems theory. Although the main ideas
had already been put forward by Boltzmann, many misguided
applications of nonlinear time series analysis appeared
in the literature after the rediscovery of chaos
(see, e.g., Ref. 9).

Probably, one of the main reasons for excluding this
topic from basic courses is the need to introduce advanced
technical tools such as, for example, the embedding
technique. Therefore, here, we present the problem in
its simplest formulation. Often, when recording the
evolution of a system with unknown dynamics, the
variables necessary to identify the states, or even their
number, are not all known. Moreover, even if we are lucky
enough to know them, we can access only one or a few scalar
functions of them, typically affected by measurement errors.
Throughout this paper, we will disregard all these technical
difficulties (which can be handled to a large extent with
specific techniques) and assume that the necessary variables
can be recorded with arbitrary precision. Even under such ideal
working hypotheses, the above-mentioned fundamental
constraints on predictability are unavoidable.

The material is organized as follows. In Sect. II, after
some historical notes, we introduce the method of analogues
as the simplest procedure to predict the future
from past time series. Sect. III introduces the model
system used to clarify the main issues. In Sect. IV,
we discuss the link between analogues and Poincaré
recurrences, and show how the actual limitations to
predictability from data stem from the effective number of
degrees of freedom. Sect. V discusses two cases where
the method works successfully: one is illustrated by a
numerical example and the other refers to the important
practical problem of tidal predictions. Finally, Sect. VI
is devoted to conclusions.



II. THE METHOD OF ANALOGUES 

"If a system behaves in a certain way, it will do so again"
seems a rather natural claim when referred, for instance,
to the diurnal and seasonal cycles, and it is also supported by
biblical tradition: What has been will be again, what has
been done will be done again; there is nothing new under
the sun [Qohelet 1:9, NIV]. This idea, together
with the belief in determinism (from the same antecedents
follow the same consequents), is at the basis of prediction
methods. However, as Maxwell argued: It is a metaphysical
doctrine that from the same antecedents follow
the same consequents. [...] But it is not of much use in a
world like this, in which the same antecedents never again
concur, and nothing ever happens twice. [...] The physical
axiom which has a somewhat similar aspect is "That from
like antecedents follow like consequents." These words
no longer surprise scientists, who are by now aware of the
almost exceptional character of periodic behaviors and
of the ubiquitous presence of irregular evolutions due to
deterministic chaos; but at that time they constituted a
break with tradition.

In spite of Maxwell's authoritative opinion, until World
War I weather forecasters essentially used empirical
implementations of the naive idea, exploiting their
experience and memory of past similar "patterns" (roughly,
surfaces of discontinuity between warm and cold air
masses) to produce weather map predictions. In the
preface to his seminal book Weather Prediction by
Numerical Process, Richardson criticizes the empirical
approaches and, through an argument similar to that of
Maxwell, contends that for weather forecasting it is
much more useful to integrate the partial differential
(namely the thermo-hydrodynamical) equations ruling
the atmosphere. Although, as history has shown, the
successful approach to predictions is the one foreseen by
Richardson, it is interesting to discuss the range of
applicability of predictions based on the past evolution of
a deterministic system.

A mathematical formulation of the idea is due to
Lorenz and is called the method of analogues. It
can be considered the most straightforward approach
to predictability in the absence of a detailed knowledge
of the physical laws.

In its simplest description, the method works as follows.
Assume that the known state $x(t)$ of a process
can be sampled at times $t_k = k\Delta t$ with arbitrary
precision. The sampling interval $\Delta t$ is also assumed to be
arbitrary but not too short. We collect the sequence of
states $x_k = x(t_k)$ with $k = 1, \ldots, M$. If from the present
state $x_M$ we would like to forecast the future $x_{M+T}$ at
time $t_{M+T}$ ($T \ge 1$), the basic idea is to search in the
past $(x_1, x_2, \ldots, x_{M-1})$ for the state, say $x_k$, most similar
to $x_M$, and to use its consequents as proxies for the
future evolution of $x_M$. Mathematically, we require that
$|x_k - x_M| \le \epsilon$, and we dub $x_k$ an $\epsilon$-analogue to $x_M$. If
the analogue were perfect ($\epsilon = 0$), the system (being
deterministic) would surely be periodic and the prediction
trivial: $x_{M+T} \equiv x_{k+T}$ for any $T$. If it is not perfect
($\epsilon > 0$), we can use the forecasting recipe

$$\hat{x}_{M+T} = x_{k+T} , \qquad (1)$$

as from like antecedents follow like consequents (see
Fig. 1a). For the prediction (1) to be meaningful, the
analogue $x_k$ must not be a near-in-time antecedent.
When more than one analogue can be found, the
generalization of (1) is obvious, see Fig. 1b.

FIG. 1: (Color online) Sketch of the method of analogues: (a)
illustration of Eq. (1) and of the error growth; (b) generalization
of the method to more than one analogue. In particular,
if $N_a$ analogues $\{x_{k_n}\}_{n=1}^{N_a}$ are found, (1) can be replaced by
$\hat{x}_{M+T} = \sum_{n=1}^{N_a} E_n x_{k_n+T}$, where the matrices $E_n$ can be
computed by suitable interpolations.

Once a "good" analogue (meaning $\epsilon$ reasonably small)
has been found, the next step is to determine the
accuracy of the prediction (1), namely the difference
between the forecast and the actual state, $|\hat{x}_{M+T} - x_{M+T}|$.
In practice, the $\epsilon$-analogue is the present state with an
uncertainty, $x_k = x_M + \delta_0$ ($\delta_0 \le \epsilon$), and the prediction
(1) can be considered acceptable as long as the error
$\delta_T = |x_{M+T} - x_{k+T}|$ remains below a tolerance $\Delta$,
dictated by practical needs. The predictability time
$T_p = T(\delta_0, \Delta)$ is then defined by requiring $\delta_T \le \Delta$ for
$T \le T_p$.

Accuracy and predictability time $T_p$ are clearly related
to (possible) sensitivity to initial conditions, as pioneered
by Lorenz himself. As taught in basic dynamical systems
courses, chaotic evolutions exponentially amplify an
infinitesimal error:

$$\delta_T \simeq \delta_0 \, e^{\lambda_1 T} , \qquad (2)$$

$\lambda_1$ being the maximal Lyapunov exponent. For a gentle
introduction to Lyapunov exponents the reader may
refer to Ref. 18. Therefore, given a good analogue, the
prediction will be $\Delta$-accurate up to a time

$$T_p \simeq \frac{1}{\lambda_1} \ln\left(\frac{\Delta}{\delta_0}\right) . \qquad (3)$$

Strictly speaking, for the above equation to be valid, both
$\delta_0$ and $\Delta$ must be very small. It is worth remarking
that the evaluation of the error growth rate (2) provides,
at least in principle, a way to determine the Lyapunov
exponent $\lambda_1$ from a long time series.

Conversely, deterministic non-chaotic systems are less
sensitive to initial conditions: the error grows polynomially
in time, and usually $T(\delta_0, \Delta)$ turns out to be longer
than that of chaotic systems, making long-term predictions
possible.
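
As an illustration of the recipe (1), the following minimal Python sketch searches a recorded scalar series for the best analogue of the present state and uses its consequents as a forecast. The logistic map used to generate the data, the parameter value r = 3.9, the function names and the `min_lag` exclusion window are assumptions made here for illustration only; they are not part of the models studied in this paper.

```python
def logistic_series(x0, n, r=3.9):
    """Generate n samples of the logistic map x -> r x (1 - x) (illustrative data)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def best_analogue(series, min_lag=10):
    """Index k of the past state closest to the present state series[-1].

    States closer in time than min_lag samples are skipped, so that the
    analogue is not a near-in-time antecedent.
    """
    present = series[-1]
    last = len(series) - 1
    candidates = range(0, last - min_lag)
    return min(candidates, key=lambda k: abs(series[k] - present))


def analogue_forecast(series, horizon, min_lag=10):
    """Forecast x_{M+T} for T = 1..horizon as x_{k+T}, following Eq. (1)."""
    k = best_analogue(series, min_lag)
    last = len(series) - 1
    # The consequents of the analogue must lie inside the recorded history.
    horizon = min(horizon, last - k)
    return k, [series[k + t] for t in range(1, horizon + 1)]


# Record a "history", keep the true continuation aside, and compare.
full = logistic_series(0.3, 20_010)
past, future = full[:20_000], full[20_000:]
k, forecast = analogue_forecast(past, horizon=10)
eps = abs(past[k] - past[-1])                             # quality of the analogue
errors = [abs(f - x) for f, x in zip(forecast, future)]   # delta_T for T = 1..10
```

In such a chaotic example the list `errors` is expected to grow with the horizon roughly as in Eq. (2), until it saturates at the size of the attractor; plotting its logarithm against T gives a rough estimate of the error growth rate.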

For those familiar with chaotic systems, we have apparently
reached the obvious conclusion that the main limit to
predictions based on analogues is the sensitivity to
initial conditions typical of chaos. But, as realized by
Lorenz himself, the main issue is to find good (small $\epsilon$)
analogues: In practice this procedure may be expected
to fail, because of the high probability that no truly good
analogues will be found within the recorded history of the
atmosphere. He also pointed out that the real drawback
of the method stems from the need for a very large data
set, independently of the presence of chaos.

It is worth concluding this historical presentation with
a brief comment on the application of the method of
analogues in Lorenz's original work. Lorenz
strongly supported weather forecasting based on solving
the (approximate) equations of the atmosphere, as
outlined by Richardson. He realized that the intrinsic limits
to weather forecasting cannot be established by
estimating the intrinsic error growth of these solutions. His
work represents the first attempt to estimate the
Lyapunov exponent from data, pioneering modern time
series analysis. Unfortunately, he also realized that the
true Lyapunov exponent of the atmosphere cannot be
estimated from data, as good analogues cannot be found
and the difference between mediocre analogues may be
expected to amplify more slowly than the difference between
good analogues, since the nonlinear effects play a greater
role when the errors are large.



III. STUDY OF A SIMPLE MODEL 

The difficulties in finding good analogues can be
quantified by studying analogue statistics. As an illustrative
example, we compute numerically the probability of
finding $\epsilon$-analogues to a state in a simple model system
introduced by Lorenz in 1996, hence called the Lorenz-96 model.
It consists of the following nonlinearly coupled ordinary
differential equations

$$\frac{dX_n}{dt} = X_{n-1}(X_{n+1} - X_{n-2}) - X_n + F , \qquad (4)$$

where $n = 1, \ldots, N$ and periodic boundary conditions
($X_{N \pm n} = X_{\pm n}$) are assumed. The variables $X_n$ may
be thought of as the values of some representative
atmospheric observable along a latitude circle, so that
Eq. (4) can be regarded as a one-dimensional caricature
of atmospheric motion. The quadratic coupling
conserves the energy $\sum_n X_n^2$. In the presence of forcing $F$
and damping $-X_n$, the energy is conserved only statistically.
The motion is thus confined to a bounded region
of $\mathbb{R}^N$. Moreover, dissipation constrains the trajectories
to evolve onto a subset of this region, possibly with
dimension $< N$, namely an attractor (fixed points, limit
cycles or a strange attractor if the dynamics is chaotic).
The dynamical features are completely determined by the
forcing strength $F$ and by the system dimensionality $N$.
In particular, for $F > 8/9$ and $N > 4$ the system
displays chaos with exponential separation of nearby initial
conditions.
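
A trajectory of the model (4) can be generated, for instance, with the short Python sketch below. The fourth-order Runge-Kutta scheme, the time step and the initial condition are assumptions made here for illustration; the text does not specify the integration scheme actually used.

```python
import math


def lorenz96_rhs(x, forcing):
    """Right-hand side of Eq. (4) with periodic boundary conditions."""
    n = len(x)
    return [
        x[(i - 1) % n] * (x[(i + 1) % n] - x[(i - 2) % n]) - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, dt, forcing):
    """One fourth-order Runge-Kutta step (the integrator is our choice)."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs([a + 0.5 * dt * b for a, b in zip(x, k1)], forcing)
    k3 = lorenz96_rhs([a + 0.5 * dt * b for a, b in zip(x, k2)], forcing)
    k4 = lorenz96_rhs([a + dt * b for a, b in zip(x, k3)], forcing)
    return [
        a + dt * (p + 2.0 * q + 2.0 * r + s) / 6.0
        for a, p, q, r, s in zip(x, k1, k2, k3, k4)
    ]


def integrate(x0, dt, n_steps, forcing):
    """Return the list of states x_0, x_1, ..., x_{n_steps}."""
    states = [list(x0)]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], dt, forcing))
    return states


# Small perturbation of the uniform state X_n = F (illustrative initial condition).
x0 = [5.0 + 0.01 * math.sin(i) for i in range(20)]
trajectory = integrate(x0, dt=0.01, n_steps=200, forcing=5.0)
```

In practice one would discard an initial transient before collecting states, so that the recorded sequence lies on the attractor.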

In principle, the statistics of the analogues of system
(4) can be determined according to the following
procedure. Given a state $x_M$ of the system on the attractor, we
have to consider its precursors $(x_1, \ldots, x_{M-1})$ along the
trajectory ending in $x_M$, sampled at regular time intervals
of duration $\Delta t$, $x_i = x(t_i = i\Delta t)$. The $\epsilon$-analogues
of $x_M$ are then those states $x_j$ such that $|x_j - x_M| \le \epsilon$.
Finally, the fraction of $\epsilon$-analogues,

$$C_M(\epsilon) = \frac{1}{M-1} \sum_{j=1}^{M-1} \Theta(\epsilon - |x_j - x_M|) , \qquad (5)$$

where $\Theta$ is the Heaviside step function, provides an
estimate of the probability of finding $\epsilon$-analogues
to $x_M$ as a function of both the desired degree
of similarity $\epsilon$ and the length $M$ of the recorded
history. Since we are interested in typical behaviors and not
just in the properties around a specific state $x_M$, it is
convenient to average $C_M(\epsilon)$ over $r$ independent
reference states. Therefore, instead of considering only the
end point $x_M$, we select $r$ states $\{x^*_k\}_{k=1}^{r}$ along the
trajectory, sufficiently well spaced in time to be considered
independent configurations on the attractor, and we replace (5) by the
average fraction of $\epsilon$-analogues

$$C_{r,M}(\epsilon) = \frac{1}{rM} \sum_{k=1}^{r} \sum_{j=1}^{M} \Theta(\epsilon - |x_j - x^*_k|) . \qquad (6)$$

As in our case we know the evolution laws (4), it is not
really necessary to look at the backward time series of the
reference states. In practice, we can select the $\{x^*_k\}_{k=1}^{r}$
and look at their forward $\epsilon$-analogues.
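
The counting behind Eqs. (5) and (6) can be sketched in a few lines of Python. To keep the example self-contained and fast, the data are not taken from the Lorenz-96 model but from a toy system chosen here as an assumption: a point rotating on the unit circle by an irrational angle. Function names, the number of reference states and the scanned window (`horizon`) are likewise illustrative.

```python
import math


def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def analogue_fraction(states, eps):
    """C_M(eps) of Eq. (5): fraction of past states within eps of the last one."""
    present = states[-1]
    past = states[:-1]
    hits = sum(1 for x in past if distance(x, present) <= eps)
    return hits / len(past)


def averaged_analogue_fraction(states, ref_indices, eps, horizon):
    """Average over reference states of the fraction of forward eps-analogues.

    For each reference index i, the `horizon` states following it are scanned.
    """
    total = 0
    for i in ref_indices:
        ref = states[i]
        window = states[i + 1 : i + 1 + horizon]
        total += sum(1 for x in window if distance(x, ref) <= eps)
    return total / (len(ref_indices) * horizon)


# Toy data (an assumption, not the Lorenz-96 model): quasi-periodic motion
# on the unit circle, whose "attractor" is the circle itself.
golden = (math.sqrt(5.0) - 1.0) / 2.0
states = [
    (math.cos(2 * math.pi * golden * k), math.sin(2 * math.pi * golden * k))
    for k in range(3000)
]
refs = range(0, 1000, 100)  # 10 well-separated reference states
c_small = averaged_analogue_fraction(states, refs, 0.1, horizon=2000)
c_large = averaged_analogue_fraction(states, refs, 2.5, horizon=2000)
```

For this toy system the fraction saturates to 1 once the threshold exceeds the diameter of the circle, and for small thresholds it is proportional to the threshold itself, as expected for a one-dimensional set.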

The latter procedure is used to produce Fig. 2, where
we show $C_{r,M}(\epsilon)$ obtained with $r = 10^3$ reference states
and different lengths $M$ of the time series, from 10^
to 10^. Of course, when the degree of similarity $\epsilon$
becomes larger than the attractor size, say $\epsilon_{\max}$, the
fraction $C_{r,M}(\epsilon)$ saturates to 1. Therefore, it is meaningful
to normalize the degree of similarity by $\epsilon_{\max}$. As for the
dynamics (4), the forcing is fixed to $F = 5$ and we
consider two system sizes, $N = 20$ and $N = 21$. In both cases
the system is chaotic. While for $N = 21$ analogues can be
found with reasonable probability even for small values of
$\epsilon$ (of order 10^- $\epsilon_{\max}$), for $N = 20$ analogues are found only for
large values of $\epsilon$ ($>$ 10^- $\epsilon_{\max}$), even for $M$ = 10^. The
solid lines in Fig. 2 indicate that for $\epsilon \ll \epsilon_{\max}$ the
probability of finding an analogue is fairly well approximated by
a power law

$$C_{r,M}(\epsilon) \propto \epsilon^{D_A} . \qquad (7)$$

In particular, we find $D_A \simeq 3.1$ and $D_A \simeq 6.6$ for $N = 21$
and $N = 20$, respectively. Therefore, the exponent $D_A$
quantifies the difference between the two cases: upon
lowering $\epsilon$, the probability of finding $\epsilon$-analogues with
$N = 20$ becomes about $\epsilon^{3.5}$ times smaller than with $N = 21$.
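
An exponent such as the one in Eq. (7) is commonly estimated as the slope of a straight-line fit in log-log scale. The sketch below shows this step on synthetic data constructed to follow a power law with exponent 3.1; the function name, the prefactor and the range of thresholds are assumptions for illustration, and real data would require restricting the fit to the scaling region.

```python
import math


def loglog_slope(eps_values, c_values):
    """Least-squares slope of log C versus log eps (entries with C = 0 are skipped)."""
    pts = [(math.log(e), math.log(c)) for e, c in zip(eps_values, c_values) if c > 0]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    return sxy / sxx


# Synthetic fractions obeying C = 0.5 * eps**3.1 (illustrative, not measured data).
eps_values = [10 ** (-0.25 * i) for i in range(1, 9)]
c_values = [0.5 * e ** 3.1 for e in eps_values]
slope = loglog_slope(eps_values, c_values)
```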

The probability of finding $\epsilon$-analogues is expected to
decrease upon increasing the number of degrees of freedom
$N$, as more constraints on the single components of the
state vector must be satisfied. In this perspective, the
above result seems at odds with intuition unless the
exponent $D_A$ in (7) is interpreted as the "effective" number
of degrees of freedom.

We end this section by warning the reader that the
counter-intuitive inequality $D_A(N = 21) < D_A(N = 20)$
is a peculiar consequence of the choice of the parameters
$F$ and $N$. Generally, $D_A$ is expected to increase with
$N$. Here, we made this choice to emphasize the importance
of the effective number of degrees of freedom that,
in general, is not trivially related to (and can be much
smaller than) the number of variables $N$. As we shall
see in the next section, $D_A$ is nothing but the attractor
dimension, a measure of the effective number of degrees
of freedom.



IV. DEGREES OF FREEDOM, RECURRENCE 
TIMES AND ANALOGUES 

In this section we recall some basic notions of ergodic
dynamical systems and underline their connections with
the analogues. In particular, we link the difficulty of finding
analogues to the presence of long recurrence times.



A. The role of dimensions 

The founding principle of ergodic theory is that
the long-time statistical properties of a system can be
equivalently described in terms of the invariant
(time-independent) probability $\mu$, such that $\mu(\sigma)$ is the
probability of finding the system in any specified region $\sigma$ of
its phase space. The phase space of a system described
by $N$ degrees of freedom is a region of $\mathbb{R}^N$, that is, an
$N$-dimensional space.

If the evolution conserves phase-space volumes (as in
the Hamiltonian motion of classical systems), then the
probability $d\mu(x)$ of finding the state in a small region
of volume $dV$, as defined in elementary geometry, around
$x$ is proportional to $dV$, i.e. to the Lebesgue measure of
that region. In dissipative systems, phase-space volumes
are contracted on average and the invariant probability
$d\mu(x)$ is not proportional to $dV$, but concentrates on
a set (the attractor) $A \subset \mathbb{R}^N$ of dimension $D_A < N$.
Slightly more formally, the dimension $D_A$ describes the
small-scale ($\epsilon \ll 1$) behavior of the probability $\mu(B_y(\epsilon))$
of finding points $x \in A$ which are in the $N$-dimensional
sphere of radius $\epsilon$ around $y$:

$$\mu(B_y(\epsilon)) = \int_{B_y(\epsilon)} d\mu(x) \sim \epsilon^{D_A} . \qquad (8)$$

Therefore, the trajectories of dissipative systems are
effectively described by a number $D_A < N$ of degrees of
freedom, though defined in an $N$-dimensional space.

For a non-integer $D_A$, attractor and probability are said
to be fractal. In general, attractors are non-homogeneous
with $D_A$, in Eq. (8), depending on $y$, and an infinite set
of dimensions is needed to fully characterize the invariant
probability: we speak of multifractal objects. For the
sake of our discussion, these technical complications can
be ignored, and the attractor can be assumed homogeneous
and characterized by a single dimension $D_A$.

Upon reconsidering $C_M(\epsilon)$ defined in Eq. (5), we see
that it is nothing but the fraction of time the trajectory
spends in a sphere of radius $\epsilon$ centered in $x_M$. For
large $M$, as a consequence of ergodicity, $C_M(\epsilon)$ gives the
probability of finding the system in that sphere, and the
quantity (6) is an averaged probability. Therefore, for
sufficiently large $M$ and small $\epsilon$, Eq. (8) implies

$$C_{r,M}(\epsilon) \simeq \langle \mu(B(\epsilon)) \rangle \sim \epsilon^{D_A} . \qquad (9)$$

Strictly speaking, in Eq. (9) the right exponent should be
the correlation dimension $D_2$, which controls the small-scale
asymptotics of the probability of finding two points on
the attractor at distance $\le \epsilon$. Thanks to the homogeneity
assumption, however, we have $D_2 = D_A$.

FIG. 2: (Color online) $C_{r,M}(\epsilon)$ vs. $\epsilon/\epsilon_{\max}$ for $F = 5$, $N = 20$
and $N = 21$; the reference states are $r = 1000$ and different
values of $M$ ranging from 10^ to 10^ are considered. The solid
lines are the fits of the data by means of relation (9).

Relation (9) links the observed behavior (7) in Fig. 2
to the attractor dimension, showing that the limiting factor
in finding good analogues is the attractor dimension,
which quantifies the number of "active" degrees of freedom
of the system. For those accustomed to chaotic
systems, this result is rather obvious, as $C_{r,M}(\epsilon)$ in (6)
provides a standard approximation to the correlation
sum, $\frac{2}{M(M-1)} \sum_{i, j>i} \Theta(\epsilon - |x_i - x_j|)$, which is at the basis
of the Grassberger and Procaccia method to determine
the correlation dimension $D_2$. Indeed, the correlation
sum is an unbiased estimator of the probability $P_2(\epsilon)$ of
finding two randomly chosen points on the attractor (using
a long trajectory on it) at a distance $\le \epsilon$. For small $\epsilon$,
$P_2(\epsilon) \sim \epsilon^{D_2}$ and thus the correlation dimension can be
estimated.



B. Poincaré recurrence theorem and Kac's lemma 

The quantity $C_M(\epsilon)$, besides approaching (for large
$M$) the probability of finding the system state $\epsilon$-close to
$x_M$, is related to the average time interval $\tau_R(\epsilon)$ between two
consecutive $\epsilon$-analogues of $x_M$, which is given by

$$\tau_R(\epsilon) = \frac{(M-1)\Delta t}{\mathcal{M}(\epsilon)} , \qquad (10)$$

$\mathcal{M}(\epsilon)$ being the number of $\epsilon$-analogues in the interval
$[t_1 : t_{M-1}]$. As by definition $C_M(\epsilon) = \mathcal{M}(\epsilon)/(M-1)$, we
have

$$\tau_R(\epsilon) = \frac{\Delta t}{C_M(\epsilon)} ; \qquad (11)$$

actually, this is a classical result of ergodic theory,
known as Kac's lemma.

To clarify this connection, it is worth recalling the
Poincaré recurrence theorem, stating that, in Hamiltonian
systems with a bounded phase space $\Omega$, the trajectories
exiting from a generic set $\sigma \subset \Omega$ will return
to $\sigma$ infinitely many times. The theorem holds for almost
all points in $\sigma$, except for a possible subset of zero
probability. In general, it applies to the class of systems
with volume-preserving dynamics in phase space, of
which Hamiltonian ones are a particular sub-class. Actually,
although often not stressed in elementary courses, it
can be straightforwardly extended to dissipative ergodic
systems provided one only considers initial conditions on
the attractor and "zero probability" is interpreted with
respect to the invariant probability on the attractor.

The Poincaré theorem merely proves that a trajectory
surely returns to the neighborhood of its starting point,
but does not provide information about the time between
two consecutive recurrences, the Poincaré recurrence
time. The latter is crucial to the method of analogues
because long recurrence times critically spoil its
applicability (see Eq. (11)).

To estimate the average recurrence time, let us assume
that an infinitely long sequence of states can be stored.
Without loss of generality, we consider a discrete time
sequence $x_k = x(k\Delta t)$ ($k = 0, \ldots, \infty$) of states generated
by a deterministic evolution from the initial condition
$x_0$. Given a set $\sigma$ including $x_0$, the recurrence time of
$x_0$ relative to $\sigma$, $\tau_\sigma(x_0)$, can be defined as the minimum
$k$ such that $x_k$ is again in $\sigma$,

$$\tau_\sigma(x_0) = \inf_k \{ k \ge 1 \,|\, x_0 \in \sigma \ \mathrm{and} \ x_k \in \sigma \} , \qquad (12)$$

note that we are using dimensionless times with $\Delta t = 1$.
The mean recurrence time relative to $\sigma$, $\langle \tau_\sigma \rangle$, can then
be computed as

$$\langle \tau_\sigma \rangle = \frac{1}{\mu(\sigma)} \int_\sigma \tau_\sigma(x) \, d\mu(x) , \qquad (13)$$



$\mu$ being the invariant probability with respect to the
dynamics, defined in the previous subsection. For ergodic
dynamics, a classical result known as Kac's lemma states
that

$$\int_\sigma d\mu(x) \, \tau_\sigma(x) = 1 \quad \mathrm{so\ that} \quad \langle \tau_\sigma \rangle = \frac{1}{\mu(\sigma)} , \qquad (14)$$

namely the average recurrence time to a region $\sigma$ is just
the inverse of the probability of that region. We stress
that (14) is a straightforward consequence of ergodicity.
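
Kac's lemma (14) can be checked numerically on a simple ergodic system. The sketch below uses a rotation of the circle by an irrational angle, for which the invariant probability is uniform, so that the probability of an interval is simply its length. The choice of the map, of the interval and of the function names is an assumption made here for illustration; moreover, the average over the initial conditions in the set is replaced by the average over successive returns along one long trajectory, which relies on the ergodicity assumption.

```python
import math

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number (our choice)


def recurrence_times(n_steps, lo, hi, alpha=GOLDEN):
    """Gaps between consecutive visits of x_k = k * alpha mod 1 to [lo, hi)."""
    gaps, last_visit = [], None
    for k in range(n_steps):
        x = (k * alpha) % 1.0
        if lo <= x < hi:
            if last_visit is not None:
                gaps.append(k - last_visit)
            last_visit = k
    return gaps


gaps = recurrence_times(200_000, 0.30, 0.35)
mean_gap = sum(gaps) / len(gaps)        # measured mean recurrence time
kac_prediction = 1.0 / (0.35 - 0.30)    # 1 / mu(sigma) for the uniform measure
```

The two numbers `mean_gap` and `kac_prediction` should agree closely, even though this system is perfectly regular: the mean recurrence time is fixed by the probability of the region, not by the presence of chaos.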

In a system with phase-space volume preservation
(the kind for which the Poincaré theorem is typically
invoked) with $N$ degrees of freedom, if $\sigma$ is a hypercube
of linear size $\epsilon$ one has

$$\mu(\sigma) \sim \left(\frac{\epsilon}{L}\right)^{N} \quad \mathrm{and} \quad \langle \tau_\sigma \rangle \sim \left(\frac{L}{\epsilon}\right)^{N} , \qquad (15)$$

where $L$ is the typical excursion of each component of $x$.
Thus the mean return time grows exponentially with $N$.
Consequently, in a macroscopic body ($N \gg 1$), $\langle \tau_\sigma \rangle$ is
astronomically large for any $\sigma$. The result (15) is surely
positive for the validity of statistical mechanics, as
recognized by Boltzmann himself who (without knowing Kac's
lemma) replied to Zermelo's criticism of irreversibility: Of
course if one waits long enough, the initial state will
eventually recur, but the recurrence time is so long that there
is no possibility of ever observing it. But it is dramatically
negative for the possibility of finding analogues in
high-dimensional systems.

In the case of ergodic dissipative systems, where the
coarse-grained probabilities are ruled by the dimension
$D_A$ (cf. Eq. (8)), Kac's result (15) applies with $N$
replaced by $D_A$.

We conclude this digression on Poincaré recurrences
by noting that the limitation on finding analogues set
by relation (15) is unrelated to chaos. For instance,
Eq. (15) also applies to a chain of $n$ harmonic oscillators
with incommensurable frequencies, a system with regular
(quasiperiodic) behavior. Strictly speaking, such a
system is not ergodic in the whole angle-action phase space,
but in the space of angles only. Therefore, in Eq. (15),
instead of $N = 2n$ one has to set $N = n$.



C. Consequence of Kac's Lemma 

The above results allow us to quantify Lorenz's
pessimism with respect to the amount of data necessary
for finding good analogues in the atmosphere. Clearly,
the length $M \Delta t$ of the record must exceed the mean time
$\tau_R$ between consecutive $\epsilon$-analogues, $M \Delta t > \tau_R$, which
from (11) implies $M > 1/C_M(\epsilon)$. Then, using (9), we see that the
minimum length of the time series is

$$M \sim \left(\frac{L}{\epsilon}\right)^{D_A} , \qquad (16)$$

$L$ being the typical excursion of each component of $x$.

Equation (16) implies that, at least in principle, the
method can work for deterministic systems having an
attractor of finite dimension, provided the time series is
suitably long. However, the exponential dependence on
$D_A$ in Eq. (16) imposes, upon putting in the numbers, too
severe constraints even if we content ourselves with a poor
precision, i.e. analogues with not too small $\epsilon$. For instance,
in Fig. 3 we show how the distance between a reference
point and its best analogue ($\epsilon_{\min}$) scales with $M$. We see
that for $\epsilon_{\min}/\epsilon_{\max}$ = 10^- a sequence of 10^ points is
sufficiently long in the case $N = 21$ ($D_A \simeq 3.1$) while, on the
contrary, even 10^ points are not yet enough in the case
$N = 20$ ($D_A \simeq 6.6$). Indeed, by inverting (16), we should
expect $\epsilon_{\min} \propto M^{-1/D_A}$, as shown in Fig. 3. The
differences between the cases $N = 21$ and $N = 20$ in Figs. 2 and
3 are thus a mere consequence of the different attractor
dimensionality, namely $D_A(N = 21) < D_A(N = 20)$.

FIG. 3: (Color online) $\epsilon_{\min}/\epsilon_{\max}$ vs. $M$. The parameters of
the model are the same as in Fig. 2: $F = 5$, $N = 20$ and
$N = 21$; the reference states are $r = 1000$. The solid lines are
the fits of the data by means of relation (16).

Relation (16) also lays the basis for understanding
the limits of the Grassberger and Procaccia method to
compute the correlation dimension from the scaling
behavior of the correlation sum, or its approximation (6).
In fact, it states that the larger the dimension of the
attractor, the larger the number of points $M$ necessary to
sample it within a given accuracy $\epsilon$. For example, a
segment of size $L$ will require $M \sim L/\epsilon$ points, a square
$M \sim (L/\epsilon)^2$, and so on. Smith proposed a minimum
number of points $M \sim 42^{D_A}$ (about a decade and a
half of scaling region) to get reliable results. For $D_A = 5$
or 6, Smith's recipe requires from hundreds of millions
to billions of data, too many for standard experiments.
The above considerations on the limits of applicability
of the Grassberger and Procaccia technique may sound
trivial. However, in the '80s, when nonlinear time series
analysis started to be massively employed in experimental
data analysis, the limitations due to the length of the
time series were overlooked by many researchers and a
number of misleading papers appeared even in important
journals (for a critical review see Ref. 9).
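
The orders of magnitude involved can be made explicit with a few lines of Python. The sketch evaluates the estimate (16) and Smith's rule of thumb as quoted above; the function names and the sample values of the ratio between excursion and accuracy are assumptions chosen for illustration, and both formulas are meant only as order-of-magnitude estimates.

```python
def min_series_length(excursion, eps, dimension):
    """Order-of-magnitude estimate M ~ (L / eps)**D_A from Eq. (16)."""
    return (excursion / eps) ** dimension


def smith_min_points(dimension, base=42):
    """Smith's rule of thumb M ~ 42**D_A, as quoted in the text."""
    return base ** dimension


# Same accuracy (eps = L / 100, an illustrative value) for the two
# attractor dimensions found for the Lorenz-96 model.
m_low = min_series_length(1.0, 0.01, 3.1)
m_high = min_series_length(1.0, 0.01, 6.6)

# Smith's recipe for D_A = 5 and D_A = 6.
smith_5 = smith_min_points(5)
smith_6 = smith_min_points(6)
```

With these illustrative numbers, `m_high` exceeds `m_low` by a factor of ten million, while `smith_5` and `smith_6` are of the order of a hundred million and of a few billion, respectively.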

In conclusion, the possibility of predicting the future from
the past using the analogues has practical validity only
for low-dimensional systems. More than one century later,
scientists working on prediction problems rediscovered
Maxwell's warning: same antecedents never again
concur, and nothing ever happens twice, whenever the
system is moderately high dimensional.



D. Remarks on the case of unknown phase space 

So far, we have assumed that the vector $x$ determining
the state of the system is known and can be measured
with arbitrary precision. The real situation is less
simple: usually, we do not know the whole set of variables
(not even their number) which define the state of a
system. Moreover, even knowing them, in experimental
measurements we normally have access only to very few
scalar observables $u_t$ depending on the state of the
system: $u_t = G[x_t]$. In these cases, there exists a powerful
technique (based on Takens' delay embedding theorem)
able to reconstruct the phase space, providing a rigorous
ground to the use of the analogues. Beyond the
technical (often non-trivial) aspects, the main limit of
the method, i.e. the exponential increase of $M$ with $D_A$,
still remains. Moreover, in practical implementations,
the presence of unavoidable measurement errors introduces
a further source of complications. Ways to deal
with the general case of phase-space reconstruction and
measurement errors have been developed; however, their
discussion is beyond the scope of this paper, so we refer
the reader to specialized monographs.
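
Only to fix ideas, the basic construction behind delay embedding can be sketched as follows: from a scalar series one builds vectors made of the current value and of a few delayed values, and searches for analogues among these vectors. The function name and the parameters `dim` (number of components) and `lag` (delay in sampling steps) are assumptions introduced here; their proper choice is one of the technical aspects treated in the monographs.

```python
def delay_embed(series, dim, lag):
    """Delay vectors (u_t, u_{t-lag}, ..., u_{t-(dim-1)*lag}) from a scalar series."""
    start = (dim - 1) * lag
    return [
        tuple(series[t - j * lag] for j in range(dim))
        for t in range(start, len(series))
    ]


# A short scalar record (illustrative values) embedded in three dimensions.
vectors = delay_embed([0, 1, 2, 3, 4, 5], dim=3, lag=2)
```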



V. TWO EXAMPLES WHERE THE METHOD 
OF ANALOGUES WORKS 

Chaotic low-dimensional attractors ($D_A \approx 2$ to $4$) may
occur in many physical systems (such as electric circuits,
lasers, fluid motion etc., see Ref. 3). Other natural
phenomena, such as weather, are instead characterized by
high-dimensional attractors with $D_A$ proportional to the
total number of variables involved, which is huge in the
case of the atmosphere. Thus, the conclusions of the
previous section are very pessimistic: when $D_A$ is that large,
only mediocre analogues (rather large $\epsilon$) can be found
and those are, from the point of view of predictability,
usually not so informative about the future evolution of
the system.

It is instructive, however, to consider here two
exceptions to this rule: a variation on the theme of
the Lorenz-96 model (4), and tidal predictions,
which we briefly discuss and which represent, to the best of
our knowledge, one of the few instances in which the idea of
using the past to predict the future works and still has
important practical applications.



A. Systems with multiscale structure 

We consider here systems with a multiscale structure,
where the state vector $x$ can be decomposed into
a slow component $X$, which is also the "largest" one,
and a fast component $y$, "small" with respect to $X$ (i.e.
$y_{rms} \ll X_{rms}$). If the slow components can be described
in terms of an "effective number" of degrees of freedom
much smaller than that necessary to characterize the
whole dynamics, mediocre (referred to the whole system)
analogues can be used to forecast at least the slower
evolving component. As an illustration of this kind of
system, we consider a variant of the model (4) introduced
by Lorenz himself to discuss the predictability problem
in the atmosphere, where indeed a multiscale structure
is present. The model reads



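$$\frac{dX_n}{dt} = X_{n-1}(X_{n+1} - X_{n-2}) - X_n + F - \frac{hc}{b} \sum_{k=1}^{K} y_{k,n} \qquad (17)$$

$$\frac{dy_{k,n}}{dt} = c\,b\,y_{k+1,n}(y_{k-1,n} - y_{k+2,n}) - c\,y_{k,n} + \frac{hc}{b} X_n \qquad (18)$$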

where $n = 1, \ldots, N$ and $k = 1, \ldots, K$, with boundary
conditions $X_{N \pm n} = X_{\pm n}$, $y_{K+1,n} = y_{1,n+1}$ and
$y_{0,n} = y_{K,n-1}$. Equation (17) is essentially (4) except
for the last term, which couples X to y. The variables y
evolve with a similar dynamics but are c times faster and
b times smaller in amplitude. The parameter h, set to 1,
controls the coupling strength.

FIG. 4: (Color online) $C_{r,M}(\epsilon)$ vs. $\epsilon/\epsilon_{max}$ for model (17-18)
computed for three scale separations b (as labeled), holding
the other parameters fixed at h = 1, c = 10, F = 10, N = 5
and K = 10. The gray straight line has slope $\approx 3.1$, while the
dashed lines all have the same slope $\approx 9.8$. The quantity (6)
has been computed with r = 10'^ and M = 10^. 
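
Equations (17)-(18) with their cyclic boundary conditions can be
integrated numerically in a few lines. The following is only a
minimal sketch: the function names, the fourth-order Runge-Kutta
scheme, the time step and the random initial condition are
assumptions made here for illustration, not the setup used for
Fig. 4. The parameter defaults follow the caption of Fig. 4.

```python
import numpy as np

def l96_two_scale_rhs(X, y, F=10.0, h=1.0, c=10.0, b=20.0):
    """Time derivatives of Eqs. (17)-(18).

    X: slow variables, shape (N,). y: fast variables, shape (N, K),
    where y[n, k] stands for y_{k+1, n+1} (zero-based indices).
    """
    N, K = y.shape
    coupling = h * c / b
    # Eq. (17): np.roll implements the cyclic condition on X.
    dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
          - X + F - coupling * y.sum(axis=1))
    # y_{K+1,n} = y_{1,n+1} and y_{0,n} = y_{K,n-1} turn the fast
    # variables into a single ring of length N*K when listed n by n.
    z = y.reshape(N * K)
    dz = (c * b * np.roll(z, -1) * (np.roll(z, 1) - np.roll(z, -2))
          - c * z + coupling * np.repeat(X, K))
    return dX, dz.reshape(N, K)

def rk4_step(X, y, dt, **params):
    """One fourth-order Runge-Kutta step of size dt (assumed scheme)."""
    k1X, k1y = l96_two_scale_rhs(X, y, **params)
    k2X, k2y = l96_two_scale_rhs(X + 0.5 * dt * k1X, y + 0.5 * dt * k1y, **params)
    k3X, k3y = l96_two_scale_rhs(X + 0.5 * dt * k2X, y + 0.5 * dt * k2y, **params)
    k4X, k4y = l96_two_scale_rhs(X + dt * k3X, y + dt * k3y, **params)
    X_new = X + dt * (k1X + 2 * k2X + 2 * k3X + k4X) / 6.0
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return X_new, y_new

# Illustrative run: N = 5 slow variables, K = 10 fast ones per slow one.
rng = np.random.default_rng(0)
N, K = 5, 10
X = 10.0 + rng.standard_normal(N)
y = 0.1 * rng.standard_normal((N, K))
states = []
for _ in range(2000):
    X, y = rk4_step(X, y, dt=1e-3, b=20.0)
    # Store the full state (X, y) without labelling slow and fast parts.
    states.append(np.concatenate([X, y.ravel()]))
states = np.array(states)
```

The array `states` plays the role of the recorded sequence of
states in which analogues are searched; a real study would
discard an initial transient and use a much longer run.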



We repeat the computation to measure the probability
of $\epsilon$-analogues for the dynamics (17-18), assuming
that the whole state of the system $x(t) = (X(t), y(t))$ is
accessible and ignoring which variables are slow and which
are fast, so that we must search for the analogues in the
sequence of states $x_k = (X(t_k), y(t_k))$, with $t_k = k\Delta t$.
Figure 4 shows $C_{r,M}(\epsilon)$ as a function of $\epsilon/\epsilon_{max}$ for a long
sequence, M = 10^, for fixed time scale separation c = 10 
and taking the fast component y respectively b = 20, 50
and 100 times smaller than the slow one X. The phase-space
dimensionality is N(1 + K) = 55, with N = 5 slow variables,
each coupled to K = 10 fast ones. The attractor dimension of
the whole system $D_A$, given by the scaling $C(\epsilon) \sim \epsilon^{D_A}$ at
very small $\epsilon$, is rather large ($D_A \approx 10$). However, for
$\epsilon/\epsilon_{max} > O(1/b)$ we see a second power law,
$C(\epsilon) \sim \epsilon^{D_A^{eff}}$ with $D_A^{eff} \approx 3 < D_A$, which defines a
sort of "effective dimension at large scale".
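
The two scaling regimes are read off from the logarithmic slope
of the fraction of analogues. Since Eq. (6) is not reproduced in
this section, the sketch below assumes a plain estimator: the
fraction of (reference state, recorded state) pairs closer than
$\epsilon$ in the Euclidean norm. Function names and the synthetic
test data (points uniform in a square, for which the slope
should be close to 2) are illustrative assumptions.

```python
import numpy as np

def analogue_fraction(refs, states, eps_values):
    """Fraction of (reference, state) pairs at distance below each eps.

    refs: array (r, d) of reference states; states: array (M, d).
    Brute force: builds the full r x M distance matrix.
    """
    dist = np.linalg.norm(refs[:, None, :] - states[None, :, :], axis=2)
    return np.array([np.mean(dist < eps) for eps in eps_values])

def log_slopes(eps_values, fractions):
    """Slope d ln C / d ln eps between consecutive eps values.

    Requires all fractions to be positive (no empty neighbourhoods).
    """
    return np.diff(np.log(fractions)) / np.diff(np.log(eps_values))

# Synthetic check on a set of known dimension 2.
rng = np.random.default_rng(1)
states = rng.random((20000, 2))          # uniform in the unit square
refs = 0.3 + 0.4 * rng.random((50, 2))   # kept away from the edges
eps_values = np.logspace(np.log10(0.02), np.log10(0.2), 6)
fractions = analogue_fraction(refs, states, eps_values)
slopes = log_slopes(eps_values, fractions)
```

Applied to a long trajectory of (17)-(18), the same slopes would
be expected to show one plateau at large $\epsilon$ and a larger
one at small $\epsilon$, provided the sequence is long enough to
populate the small-$\epsilon$ neighbourhoods.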

Therefore, if we are interested in predicting the slowly
evolving component of the system, and provided it is
described by a relatively low number of effective degrees of
freedom, as here, we can exploit the mediocre analogues
(i.e., the $\epsilon$-analogues with $\epsilon/\epsilon_{max} > O(1/b)$).
Moreover, with reference to Eq. (2), it is reasonable to
expect that the prediction error related to mediocre
analogues grows as $\sim \epsilon\, e^{\lambda(\epsilon) t}$, where $\lambda(\epsilon)$ can be much
smaller than the Lyapunov exponent $\lambda_1$ (indeed, as shown in
Ref. 29, $\lambda(\epsilon) \approx \lambda_1/c$). This implies that slow variables
can be predicted over longer times than the whole state
of the system, as already realized by Lorenz. In general
multiscale systems, increasing $\epsilon$ amounts to performing
a coarse-graining of the system description, which
implies the "elimination" of the fastest degrees of freedom,
associated with the smallest scales. Consequently,
coarse-graining reduces both the number of effective degrees
of freedom ($D_A^{eff}(\epsilon) < D_A$) and the error growth rate
($\lambda(\epsilon) < \lambda_1$).
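
As a rough worked estimate of what this gain means, one may ask
how long the error of a mediocre analogue stays below a given
tolerance. The tolerance $\Delta$ on the slow variables is a symbol
introduced here for illustration only, and the estimate assumes
the exponential growth law quoted above holds up to $\Delta$:

```latex
\epsilon\, e^{\lambda(\epsilon)\, T} \lesssim \Delta
\quad\Longrightarrow\quad
T(\epsilon) \sim \frac{1}{\lambda(\epsilon)} \ln\frac{\Delta}{\epsilon}
\approx \frac{c}{\lambda_1} \ln\frac{\Delta}{\epsilon}
```

With c = 10, as in Fig. 4, the characteristic time $1/\lambda(\epsilon)$ is
about ten times longer than $1/\lambda_1$, at the price of a coarser
initial accuracy $\epsilon$.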

The previous example is arguably the simplest multiscale
system, i.e., $C(\epsilon)$ vs. $\epsilon$ shows only two logarithmic
slopes, $D_A^{eff}$ and $D_A$. More generally, one can have a
logarithmic slope $D(\epsilon)$ with a series of plateaux:
$D(\epsilon) \approx D_1$ for $\epsilon \in [\epsilon_1, \epsilon_0]$,
$D(\epsilon) \approx D_2 > D_1$ for $\epsilon \in [\epsilon_2, \epsilon_1]$, and
so on ($\epsilon_0 > \epsilon_1 > \epsilon_2 \ldots$). The interested reader may
reproduce such a behavior by computing the correlation
integral of the discrete-time system discussed in Ref. 30.



B. Tidal prediction from past history 

Tidal prediction is a problem of obvious importance
for navigation. The appropriate governing equations were
established long ago by Laplace: it is necessary to study
the water level, with suitable boundary conditions, under
the gravitational forcing of the Moon, the Sun and the
Earth. Due to the practical difficulties in the treatment
of boundary conditions (only partially known and hard to
handle numerically), even with powerful computers the
fundamental equations cannot be directly used for tide
forecasting.

However, and remarkably, already in the first half of
the 19th century there existed efficient empirical methods
to compile numerical tables of tides for any location
where a record of past tides was available. As recognized
by Laplace, a great simplification comes from the
periodicity of the forcing (related to the motions of
celestial bodies), which naturally suggests treating tides
in terms of Fourier series whose frequencies are known from
celestial mechanics. Lord Kelvin and George Darwin (Charles'
son) showed that water levels can be well predicted by
a limited number of harmonics (say 10 or 20), determining
the Fourier coefficients from past data at the location of
interest. To make the numerical computations automatic and
minimize the possibility of error, Kelvin and Darwin built
a tide-predicting machine: a special-purpose mechanical
computer made of gears and pulleys. Tide-predicting machines
remained in use until about half a century ago, when they
were replaced by digital computers to compute the Fourier
series.
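
The harmonic method described above amounts to a linear
least-squares fit with prescribed frequencies. The sketch below
is an illustration under stated assumptions, not the historical
procedure: function names are hypothetical, the "record" is
synthetic and noise-free, and only two harmonics are used, with
periods of 12.42 h and 12.00 h (the principal lunar and solar
semidiurnal periods) chosen as examples.

```python
import numpy as np

def design_matrix(t, omegas):
    """Columns: constant, then cos and sin of each known frequency."""
    cols = [np.ones_like(t)]
    for w in omegas:
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    return np.column_stack(cols)

def fit_harmonics(t, level, omegas):
    """Least-squares Fourier coefficients from a past record."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(t, omegas), level, rcond=None)
    return coeffs

def predict_level(t, coeffs, omegas):
    """Evaluate the fitted harmonic series at times t."""
    return design_matrix(t, omegas) @ coeffs

# Synthetic record (hours): mean level plus two harmonics.
periods_h = np.array([12.42, 12.00])
omegas = 2.0 * np.pi / periods_h

def synthetic_level(t):
    return (1.0 + 0.8 * np.cos(omegas[0] * t - 0.3)
            + 0.3 * np.cos(omegas[1] * t + 1.1))

t_past = np.arange(0.0, 30 * 24.0, 1.0)          # 30 days, hourly
coeffs = fit_harmonics(t_past, synthetic_level(t_past), omegas)

t_future = np.arange(30 * 24.0, 32 * 24.0, 1.0)  # next 2 days
forecast_error = np.max(np.abs(
    predict_level(t_future, coeffs, omegas) - synthetic_level(t_future)))
```

The record must be long enough to separate nearby frequencies:
the two harmonics used here beat with a period of about 15 days,
so a 30-day record resolves them. Real data would add noise and
require more harmonics.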

Since tides are chaotic, it is natural to wonder why
their prediction from past records is a relatively easy
task. The reason for such a favorable circumstance is the
low number of effective degrees of freedom involved. A
detailed description of tides also involves small-scale
phenomena with very short characteristic times, e.g.
micro-turbulence; therefore the "true" $D_A$ is surely very
large, together with $\lambda_1$. The success of tidal prediction
is thus mainly a consequence of the multiscale character of
the system, which has a small $D_A^{eff}$ (and also a small
$\lambda(\epsilon)$) on the scales of interest, which are not too small,
in a way similar to the multiscale model of the previous
subsection. Indeed, quite recently, investigations of tidal
time series using the standard methods of nonlinear time
series analysis (such as embedding, etc.; see Sect. IV D)
found quite low effective attractor dimensions (between 3
and 4), with effective Lyapunov exponents of the order of
5 days$^{-1}$. That explains a posteriori the success of the
empirical method. Thanks to the low $D_A^{eff}$, analogues can
be found. Moreover, to forecast tides a few hours in
advance, the relatively low value of the Lyapunov exponent
makes the predictability time long enough for practical
purposes. Of course, quantitative details (the precise
values of $D_A^{eff}$ and of $\lambda(\epsilon)$) depend on the location,
but for the method to work, the crucial aspect is the
limited value of the effective attractor dimension.



VI. CONCLUSIONS 

It is a common belief that chaos is the main limiting
factor to predictability in deterministic systems. This is
correct as long as the evolution laws of the system under
consideration are known. Conversely, if the information
on the system evolution is based only on observational
data, the bottleneck lies in Poincaré recurrences which,
in turn, depend on the number of effective degrees of
freedom involved. Indeed, even in the most optimistic
conditions, if the state vector of the system were known
with arbitrary precision, the amount of data necessary to
make the predictions meaningful would grow exponentially
with the effective number of degrees of freedom,
independently of the presence of chaos. However, when,
as for tidal predictions, the number of degrees of freedom
associated with the scales of interest is relatively small,
the future can be successfully predicted from past history.
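
The exponential growth can be made concrete with a
back-of-the-envelope estimate. The sketch below assumes the
Kac-lemma scaling $M \sim (L/\epsilon)^{D_A}$ for the length of the
sequence needed to find an $\epsilon$-analogue, where L is the
typical size of the attractor; prefactors are ignored, and the
function name and the sample values are illustrative.

```python
def min_sequence_length(rel_eps, dim):
    """Order-of-magnitude number of states needed to find an analogue.

    rel_eps: desired analogue precision eps/L, with 0 < rel_eps < 1.
    dim: effective attractor dimension D_A.
    Assumes M ~ (L/eps)**D_A with the prefactor set to one.
    """
    if not 0.0 < rel_eps < 1.0:
        raise ValueError("rel_eps must lie in (0, 1)")
    return (1.0 / rel_eps) ** dim

# Compare a low-dimensional case with higher-dimensional ones
# at a fixed relative precision of 10%.
for dim in (3, 10, 20):
    print(f"D_A = {dim:2d}: M ~ {min_sequence_length(0.1, dim):.1e}")
```

At fixed precision, raising the effective dimension from 3 to 10
multiplies the required record length by $10^7$ under this
scaling, which is why a small effective dimension on the scales
of interest is decisive.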

We stress that the need for an amount of data that is
exponentially large in $D_A$ constitutes a genuine intrinsic
difficulty of every analysis based on time series without
any guess on the underlying dynamics. Such a difficulty
is not a peculiarity of the method of analogues, but is
inherent in all methods that rely on the occurrence
frequency of sequences of states to estimate the average of
observables. In other words, the problem arises whenever one
needs to collect enough recurrences. This obstacle may
be partially overcome by suitable information-theoretic
techniques (see, e.g., Ref. 36) allowing for optimized
reconstructions of the dynamics, whose dimensionality,
however, increases with the required accuracy. These
conclusions are further supported by a recent work by
Cubitt and coworkers, showing that the reconstruction
of dynamical equations from data is a computationally
NP-hard problem, as the needed observation time scales
exponentially with the number of degrees of freedom.

In general, the best strategy for meaningful prediction
is that envisaged by Richardson, as a clever compromise
between modeling and data analysis. In this regard, we
would like to conclude by mentioning that, in the era of
information technology, the enormous capacity for data
storage, acquisition and processing may lead some to
believe that meaningful predictions can be extracted
merely from data. For example, the magazine Wired recently
and provocatively titled an article "The End of Theory:
The Data Deluge Makes the Scientific Method Obsolete",
asserting that nowadays, with the availability of massive
data, the traditional way science progresses, by
hypothesizing, modeling, and testing, is becoming obsolete.
In this respect, we believe that, while it is undeniable
that the enormous amount of data poses new challenges, the
role of modeling cannot be dismissed. When the number of
effective degrees of freedom underlying a dynamical process
is even moderately large, predictions based solely on
observational data soon become problematic, as happens in
weather forecasting.